%===================================================================================
\section{Parallel example problems}\label{s:ex_parallel}
%===================================================================================

\subsection{A user preconditioner example: idaHeat2D\_kry\_p}\label{ss:idaHeat2D_p}

As an example of using {\ida} with the parallel MPI {\nvecp} module and the Krylov 
linear solver {\sunlinsolspgmr} with user-defined preconditioner, we provide the example
\id{idaHeat2D\_kry\_p} which solves the same 2-D heat PDE problem as \id{idaHeat2D\_kry}. 

In the parallel setting, we can think of the processors as being laid out
in a grid of size \id{NPEX} $\times$ \id{NPEY}, with each processor computing
a subset of the solution vector on a submesh of size \id{MXSUB} $\times$
\id{MYSUB}.  As a consequence, the computation of the residual
vector requires that each processor exchange boundary information
(namely the components at all interior subgrid boundaries) with its
neighboring processors.  The message-passing (implemented in the
function \id{rescomm}) uses blocking sends, non-blocking receives, and
receive-waiting, in routines \id{BSend}, \id{BRecvPost}, and
\id{BRecvWait}, respectively.  The data received from each neighboring
processor is then loaded into a work array, \id{uext}, which contains
this ghost cell data along with the local portion of the solution.

The local portion of the residual vector is then computed in the
routine \id{reslocal}, which assumes that all inter-processor
communication of data needed to calculate \id{rr} has already been
done.  Components at interior subgrid boundaries are assumed to be in
the work array \id{uext}.  The local portion of the solution vector
\id{uu} is first copied into \id{uext}.  The diffusion terms are
evaluated in terms of the \id{uext} array, and the residuals are
formed.  The zero Dirichlet boundary conditions are handled here by
including the boundary components in the residual, giving algebraic
equations for the discrete boundary conditions.

The preconditioner (implemented in \id{PsetupHeat} and \id{PsolveHeat}) uses the
diagonal elements of the Jacobian only and therefore involves only local calculations.

The \id{idaHeat2D\_kry\_p} main program begins with MPI calls to initialize MPI and to
set multi-processor environment parameters \id{npes} (number of processes) and
\id{thispe} (local process index).  Then the local and global vector lengths are set,
the user-data structure \id{Userdata} is created and initialized, and \id{N\_Vector}
variables are created and initialized for the initial conditions (\id{uu} and
\id{up}), for constraints, for the vector \id{id} specifying the differential 
and algebraic components of the solution vector, and for the preconditioner
(\id{pp}).  As in \id{idaHeat2D\_kry}, constraints are passed to {\ida} through the
\id{N\_Vector} \id{constraints} and the function \id{IDASetConstraints}, with
all components set to $1.0$ to impose non-negativity on each solution component.
A temporary \id{N\_Vector} \id{res} is also created here, for use only in
\id{SetInitialProfiles}.  In addition, for illustration purposes,
\id{idaHeat2D\_kry\_p} also excludes the algebraic components of the solution
(specified through the \id{N\_Vector} \id{id}) from the error test by calling
\id{IDASetSuppressAlg} with a flag \id{SUNTRUE}.

Sample output from \id{idaHeat2D\_kry\_p} follows.
%%
\includeOutput{idaHeat2D\_kry\_p}{../../examples/ida/parallel/idaHeat2D_kry_p.out}
%%

%-----------------------------------------------------------------------------------

\subsection{An IDABBDPRE preconditioner example: idaFoodWeb\_kry\_bbd\_p}\label{ss:idaFoodWeb_bbd_p}

In this example, we solve the same food web problem as with
\id{idaFoodWeb\_bnd}, but in parallel and with the {\sunlinsolspgmr} linear solver and
using the {\idabbdpre} module, which generates and uses a band-block-diagonal 
preconditioner.

As with \id{idaHeat2D\_kry\_p}, we use a \id{NPEX} $\times$ \id{NPEY} processor grid, with
an \id{MXSUB} $\times$ \id{MYSUB} submesh on each processor.  Again, the residual
evaluation begins with the communication of ghost data (in \id{rescomm}),
followed by computation using an extended local array, \id{cext}, in the
\id{reslocal} routine.
The exterior Neumann boundary conditions are explicitly handled here
by copying data from the first interior mesh line to the ghost cell
locations in \id{cext}.  Then the reaction and diffusion terms are
evaluated in terms of the \id{cext} array, and the residuals are formed.

The Jacobian block on each processor is banded, and the half-bandwidths of
that block are both equal to $\id{NUM\_SPECIES}\cdot\id{MXSUB}$.
This is the value supplied as \id{mudq} and \id{mldq} in the call to
\id{IDABBDPrecInit}.  But in order to reduce storage and computation costs for
preconditioning, we supply the values \id{mukeep} = \id{mlkeep} = 2
(= \id{NUM\_SPECIES}) as the half-bandwidths of the retained band matrix blocks.
This means that the Jacobian elements are computed with a difference quotient
scheme using the true bandwidth of the block, but only a narrow band
matrix (bandwidth 5) is kept as the preconditioner.

The function \id{reslocal} is also passed to the {\idabbdpre} preconditioner
as the \id{Gres} argument, while a \id{NULL} pointer is passed for the \id{Gcomm}
argument (since all required communication for the evaluation of \id{Gres} was
already done for \id{resweb}).

In the \id{idaFoodWeb\_kry\_bbd\_p} main program, following MPI initializations and
creation of user data block \id{webdata} and \id{N\_Vector} variables,
the initial profiles are set, the {\ida} memory block is created, the
{\sunlinsolspgmr} linear solver is created and attached to the {\ida}
solver, and the {\idabbdpre} preconditioner is initialized.
The call to \id{IDACalcIC} corrects the initial values so that they are
consistent with the DAE algebraic constraints.  

In a loop over the desired output times, the main solver function \id{IDASolve}
is called, and selected solution components (at the bottom-left and top-right
corners of the computational domain) are collected on processor 0 and printed
to \id{stdout}. The main program ends by printing final solver statistics,
freeing memory, and finalizing MPI.

Sample output from \id{idaFoodWeb\_kry\_bbd\_p} follows.
%%
\includeOutput{idaFoodWeb\_kry\_bbd\_p}{../../examples/ida/parallel/idaFoodWeb_kry_bbd_p.out}
%%

